Data Simulation

Simulating networks

We adopt a top-down approach to simulate hierarchical networks, considering various simulation parameters such as graph sparsity, noise, and the architecture of the super-level graph(s), including small-world, scale-free, and random graph networks (Watts and Strogatz 1998; Barabási and Bonabeau 2003).

Our simulations focus on basic hierarchies comprising one or two hierarchical layers. Two-layer networks mirror classical community detection on graphs, where our aim is to recover the true community labels from a given graph. Meanwhile, three-layer networks present a more intricate scenario, where the bottom layer of the hierarchy contains two levels of community structure. Here, the top level corresponds to the nodes at the uppermost layer of the hierarchy, and the middle level consists of communities nested within the top-level communities. The objective with these networks is to identify both sets of community partitions.

In each hierarchy, for fully connected networks, we initiate by simulating \(n_{\text{top}}\) top-level nodes, adhering to a directed small-world, random graph, or scale-free network architecture (Watts and Strogatz 1998; Barabási and Bonabeau 2003). In cases where the network is disconnected, we simply simulate \(n_{\text{top}}\) disconnected nodes. For networks with three hierarchical layers, we then generate a subnetwork of \(n_{\text{middle}}\) nodes from each top-layer node, adhering to the network structure utilized at the top level. If the network is fully connected, we apply a probability \(p_\text{between}\) to the nodes from different top-level communities being connected.

The final step in all hierarchies is to generate the nodes in the observed (bottom) layer of the hierarchy. For each top-layer or middle-layer node, we generate a subnetwork of \(n_{\text{bottom}}\) nodes under the same subnetwork structure as the previous layers, and we apply a probability \(p_\text{between}\) for nodes from different communities to share an edge.

Simulating gene expression

Once we simulate a hierarchical graph, we utilize this hierarchy to generate the node-feature matrix, which depicts the expression of \(N\) genes across \(p\) samples. Here, \(N\) denotes the number of nodes in the observed (bottom) layer of the hierarchy, and its range is governed by \(a^{\ell+1}<N<a\times b^\ell\), where \(\ell\) signifies the number of hierarchical layers.

We simulate the node-feature matrix using the topological order the observed level graph. We start by generating the features of nodes that have no parental input. We refer to these nodes as origin nodes. All origin nodes are simulated from a normal distribution with mean \(0\) and standard deviation \(\sigma\). All other nodes are simulated from a normal distribution centered at the mean of their parent nodes and with standard deviation \(\sigma\).

Hierarchical Commuity Detection (HCD) Overview

Our HCD method consists of two primary components:

  1. A graph autoencoder based on the architecture proposed by Salehi and Davulcu (2019) which utilizes graph attention layers such as those first indroduced by Veličković et al. (2017) (See most recent version of pseudocode for details). In our applications, we incorporate multi-head attention in all encoder and decoder layers to expand model learning capacity. The graph autoencoder module takes a set of node attributes and an adjacency matrix defining the relationships between the node as input and learns a low dimensional embedding of the network and attributes. This embedding is then used to reconstruct the node attributes and adjacency matrix under a separate loss function for each.

  2. The second component of HCD takes the embeddings generated by the autoencoder and applies a multilevel community detection process. This module is composed of \(l\) fully connected layers, each representing a level in the hierarchy. Each layer’s goal is to group the \(k_{i-1}\) nodes from the previous level into \(k_i\) communities. The number of layers in the hierarchy and the number of communities at each level are predefined parameters that need to be determined through other methods. In our applications, we use the simulation truth for each parameter so that the communitiy detection module consists of \(2\) layers where the first layer assigns the nodes at the bottom layer to the true number of communities at the middle layer. The second community detection layer assigns the nodes in the middle layer to communities in the top layer.

Datasets

We consider three sets of hierarchical networks which represent varying difficulty levels for inference:

  1. Complex networks - used for final simulation assessment - Table 1.1 - 1.3 .

  2. Intermediate networks - used for investigative model tuning and performance assessment - Table 1.4 .

  3. Simple networks - used for code implementation and debugging - Table 1.5.

Application to Intermediate Networks

A comprehensive overview of the intermediate networks is presented in Table 1.4. These networks are structured as three-layered systems, each characterized by small-world, scale-free, or random graph architectures. In contrast to the more intricate networks featured in the Complex Networks dataset, the intermediate networks exhibit a comparatively simpler configuration. Specifically, each network comprises \(5\) super layer nodes, \(15\) middle layer nodes, and \(300\) bottom layer nodes. Our primary focus in utilizing this dataset is to examine the performance of the Hierarchical Community Detection (HCD) method when applied to three-layer networks. The smaller scale of these networks facilitates a more in-depth analysis of the detected communities within the middle and upper layers of their hierarchical structures.

We apply the HCD method to each network separately using three options for the input graph corresponding to the nodes at the observed layer of the hierarchy:

We also explore various combinations of weighting the loss function across each of the aforementioned input graphs. In all cases, we ensure that the predicted number of communities in the middle or top levels of the hierarchy aligns with the ground truth of the simulation.

Evaluating performance

We evaluate the performance of our HCD method using three graph-based clustering metrics:

  1. homogeneity evaluates the degree to which each predicted community contains only data points from a single true community, indicating how well the algorithm avoids mixing different groups. Thus, homogenity tends to be high if resolved communities contain only members of the same true community.

  2. completeness assesses the extent to which all data points that belong to the same true community are correctly assigned to a single predicted community. Thus completeness is always high if all members of the same true communities end up in the same resolved community even if several true communities are allocated together.

  3. NMI is a weighted average of the previous two metrics.

For each simulation, we configure the number of communities in the middle and upper layers of the hierarchy to match the true count in each layer. Then, we evaluate the community predictions of the Hierarchical Community Detection (HCD) algorithm at these levels against the actual communities using three metrics. As a baseline, we employ the Louvain method, which utilizes hierarchical graph partitioning to maximize modularity, resulting in a single set of resolved communities. These resolved communities may align with the middle, upper, or a combination of both layers in the true hierarchy. Thus, we compute the performance metrics of the communities identified by the Louvain method against the true communities at both the upper and middle levels of the hierarchy.

Examples

In examples 1-10 we use the true graph as the input graph and we assess the quality of performance primarily on the ability to infer communities for the top layer.

Example 1-3 Examples 1-2 illustrate the impact of the attribute reconstruction loss. Example 3 . Specifically, Example 1 demonstrates the negative effects of overemphasizing the attribute reconstruction loss by setting its parameter too high, which dilutes the impact of graph reconstruction. This imbalance leads to inaccuracies in the reconstructed graph. Example 2 shows that when graph reconstruction is given equal priority to other loss components, HCD can resolve the correct communities. In Example 3, the modularity loss component is turned off by setting its value to zero to investigate the importance of modularity loss component on performance.

In all three examples, we apply HCD to the 3-layer, disconnected, small-world network in the intermediate networks dataset. In addition, all three examples use the same number of autoencoder layers, settings for clustering loss component, and 10 attention heads. All parameter settings for these examples are detailed in table 1.6. Specific outcomes for each example are discussed in greater detail in the Results section.

Examples 4-5

Results

Previously we applied our HCD method to all complex and intermediate datasets looking at various parameters such as the input graph and some initial test values for loss hyperparameters. Our previous findings are outlined in the lab report from 3/13/2024 and pointed out several issues that remain to be addressed. Below, we address these points in detail and provide supporting examples from applying HCD to networks in the Intermediate Networks dataset (Table 1.4).

An additional loss component is needed to help further reduce the tendency of HCD to combine smaller communities into super-communities.

  • We have since implemented a hierarchical adaptation of the Kmeans approach which aims to ensure members clustered to the same community have the smallest possible dissimilarity. See the latest update of the HCD pseudo code to view the specific mathematical details.

How should we assess performance when comparing the Louvain method and our HCD method?

  • Louvain uses a heuristic hierarchical community partitioning approach to find the communities that maximize modularity within a local greedy search of the space of possible partitions. The predicted communities are the final community assignments that arise when no additional assignment changes can improve the modularity - when a local optimum of modularity has been achieved. This leads to a single set of community predictions which may represent either the top layer of the true hierarchy or it may represent one of the middle layers. It may also be a blend of middle and top layers. As a result, we will compare the predictions from the Louvain method to the truth for both the middle and top layer communities. For our HCD method, we set the community detection module to mirror the simulation truth (i.e 2 layers corresponding to \(15\) middle layer communities and \(5\) top layer communities). Therefore, HCD provides two sets of community predictions corresponding to the two upper layers in the hierarchy. However, there is no assurance that the predicted communities will accurately align with the hierarchical structure they’re intended to represent. To evaluate performance, we compare the predicted middle-layer communities against the true community assignments for both the middle and top layers. Additionally, we compare the predicted top-layer communities to the true top-layer assignments. This approach generates three sets of performance metrics, which we can then qualitatively compare against the results from the Louvain method.

Settings for tuning parameters

Examples 1 - 2; When the attribute reconstruction loss is too large: To ensure that the loss for graph reconstruction isn’t overshadowed, we find it necessary to downweight the tuning parameters for the attribute reconstruction, modularity, and clustering loss components. We demonstrate this by applying HCD to a disconnected small-world intermediate network, using the true adjacency matrix as input. In this network, the communities at the top layer are completely isolated from each other. If the attribute reconstruction loss is set too high, the model can generate artificial connections in the graph. As shown in Figure 2.1, HCD has divided the original graph into two separate subgraphs by creating false links between the first three communities and the last two. This results in distinct blocks in the reconstructed adjacency matrix. Additionally, in Figure 2.2, you can see that these false connections cause the affected nodes to be incorrectly assigned to the same top-level communities. This occurs because the high attribute reconstruction loss encourages the model to form connections that aid in attribute prediction, even if they do not reflect the true graph structure (Figure 2.2). The PCA plot in Figure 2.4 indicates that the model is essentially grouping nodes based on their similarity along the first two principle dimensions of attribute space, which leads to these inaccurate connections. To correct this, we downweight the tuning parameter for the attribute reconstruction component, giving graph reconstruction a higher priority during training (Figure 2.5). This adjustment leads to a more accurate representation of the adjacency matrix and better predictions for the top-layer communities (Figures 2.6 - 2.7 ; Table 1.7).

Modularity is necessary: Our HCD method uses two loss components to assess the quality of inferred communities: (i) modularity loss and (ii) clustering loss. These two components approach community structure from different angles. Modularity loss evaluates the quality of clusters based on the input graph, ensuring that communities have more internal connections than external ones. Clustering loss, however, focuses on node features and aims to form communities where the nodes are as similar as possible, measured by their Euclidean distance to the community centroid.

In Example 3, we set the modularity tuning parameter to zero to examine whether clustering loss alone can resolve community assignments. Our results show that both metrics are needed for the best outcome (Figures 2.8 - 2.10; Table 1.7). Turning off modularity leads to overgrouping of communities 1 and 3 (Figure 2.9), likely because of their similarity along the first principal dimension of the attribute space (Figure 2.10). This suggests that modularity plays a crucial role in separating clusters that might be structurally different but have similar feature patterns.

Do tuning parameter settings hold for more complicated networks? In Example 4, we extend the setup from Example 2 by applying HCD to the more complex fully connected small-world intermediate network to evaluate whether the parameters from the earlier example still hold under more complicated structural relationships. (Table 1.6). As shown in Table 1.8, prediction performance of HCD for top-level communities is lower than in Example 2, which is understandable given the increased structural complexity of this network. Despite this drop in performance, HCD outperforms the Louvain method with a NMI score of \(72.8\%\), compared to Louvain’s \(63.8\%\).

In this example, the Louvain method achieves high homogeneity (\(79.1\%\)) but lower completeness (\(53.4\%\)), indicating that the top-layer communities are being broken into smaller, internally consistent, groups. This imbalance results in a lower overall NMI (see Figure 2.11).

In contrast, HCD achieves a better balance between these two metrics, with homogeneity at \(72.5\%\) and completeness at \(73.2\%\) (Figure 2.12; Table 1.8). This balance contributes to HCD’s higher overall NMI. However, the reduced NMI for HCD compared to its performance in Example 2 is largely due to several misallocated nodes. This misallocation is likely caused by greater heterogeneity in the top-layer communities within this more complex network, as illustrated by the PCA presented in Figure 2.13.

Similarly, in Example 5, we replicate the setup from Example 3 - turning off modularity - while applying HCD to the same fully connected small-world network. Table 1.8 shows that turning off modularity results in reduced performance, with a slightly lower NMI score of \(69.2\%\). This further validates our earlier finding that modularity is required to achieve optimal results.

Should parent (origin) nodes be simulated from the same or different distributions?

When simulating the networks for the datasets outlined in the section ** we use the hierarchy to generate the nodes at the bottom/observed layer under a specific directed graph structure (small world, scale free, random graph). This allows for nodes in the bottom layer to be traced through the levels of the hierarchy to their respective communities in each level. The simualted gene expression for the nodes in the observed layer is then generated using the topological order of the network relating the nodes at this level. Under this framework, nodes which have no parents are generated first from a normal distribution with zero mean and fixed variance

\[ N_{i,k} \sim N(0, \sigma) \]

Here \(N_{i,k}\) is the \(i^{th}\) parent node in the \(k^{th}\) middle layer communitity. This framework causes all node communities to have some similarity because they all originate from the same parents. This is clearly illustrated in PC space where all communities diverge from a central location (Figure 2.14). This similarity amongst communities may underly poorer performance for HCD, especially on more complicated networks. Therefore in Examples 6-7, we test the performance of HCD on the disconnect and fully connected small world networks when the networks are simulated by generating a different distribution for each parent node

\[ N_{i,k} \sim N(\mu_k, \sigma) \]

Simulating the hierarchy under this framework allows the parent nodes belonging to each community to have a unique distribution and corresponds to greater separability along the first principle dimension of the attribute space Figure 2.15.

Despite greater separation along the first principle dimension, HCD performance was much worse for the disconnected small world network simulated in this way (Table 1.9). As can be seen in Figure 2.16, HCD produces a result similar to what we experienced in Example 3 where communities at the top layer are grouped into super-communities leading to higher comleteness (\(84\%\)) but much lower homogenity (\(51.7\%\)) when compared to the Louvain method applied to the same network (homogenity \(100\%\), completeness \(70.5\%\)). This in part could be due to the parameter settings used previously being no longer applicable because the nature of how the data are simualted has been changed. This is supported by larger absolute loss for the clustering component - potentially washing out the graph reconstruction loss. This suggests this component may need to be downweighted further. An alternative explanation is that the model does not have enough learning capability to capture the communities correctly. We explore both possibilities further:

  • Clustering Loss Downweighted Further - In Example 7, we downwight the clustering loss even further so that the clustering loss for the middle layer is \(\lambda_{middle} = 0.0001\) and for the top layer is \(\lambda_{top} = 0.00001\). As shown in Table 1.9, this greatly improved performance for HCD with NMI now comparamble to Louvain method applied to the same network.

  • Clustering Loss Downweighted Further \(+\) Expanded Learning Capacity - In Example 8, we use downweighted clustering loss and expand model learning capacity to see if performance can be improved further.

0.0.1 Applications to scale free and random graph networks

0.0.2 Using Approximate Graphs as Input

1 Tables

Table 1.1: Summary statistics for all small world networks in the complex networks datset
Subgraph type Connect. type Layers StDev. Nodes per layer Edges per layer Subgraph prob. Sample size Modularity (top) Avg. node degree top Avg edges within communities (top) Avg. edges between communities (top) Modularity (middle) Avg. node degree middle Avg edges within communities (middle) Avg edges between communities (middle)
small world disc 2 0.1 (10, 63) (0, 63) 0.05 500 0.90 1.00 6.3 0.00 NA NA NA NA
small world disc 2 0.5 (10, 63) (0, 63) 0.05 500 0.90 1.00 6.3 0.00 NA NA NA NA
small world disc 3 0.1 (10, 63, 1604) (0, 63, 2011) 0.05 500 0.90 1.25 201.1 0.00 0.76 1.25 24.82 0.11
small world disc 3 0.5 (10, 63, 1604) (0, 63, 2031) 0.05 500 0.90 1.27 203.1 0.00 0.76 1.27 24.97 0.12
small world full 2 0.1 (10, 63) (45, 115) 0.05 500 0.45 1.82 6.3 0.58 NA NA NA NA
small world full 2 0.5 (10, 63) (45, 109) 0.05 500 0.48 1.73 6.3 0.51 NA NA NA NA
small world full 3 0.1 (10, 63, 1604) (45, 114, 1604) 0.05 500 0.77 1.43 199.3 3.39 0.67 1.43 24.94 0.19
small world full 3 0.5 (10, 63, 1604) (45, 111, 1604) 0.05 500 0.77 1.44 201.3 3.28 0.66 1.44 24.87 0.19
Table 1.2: Summary statistics for all scale free networks in the complex networks datset
Subgraph type Connect. type Layers StDev. Nodes per layer Edges per layer Subgraph prob. Sample size Modularity (top) Avg. node degree top Avg edges within communities (top) Avg. edges between communities (top) Modularity (middle) Avg. node degree middle Avg edges within communities (middle) Avg edges between communities (middle)
scale free disc 2 0.1 (10, 58) (0, 74) 0.05 500 0.89 1.28 7.4 0.00 NA NA NA NA
scale free disc 2 0.5 (10, 58) (0, 74) 0.05 500 0.89 1.28 7.4 0.00 NA NA NA NA
scale free disc 3 0.1 (10, 58, 1450) (0, 74, 6700) 0.05 500 0.89 4.62 670.0 0.00 0.91 4.62 107.07 0.15
scale free disc 3 0.5 (10, 58, 1450) (0, 74, 6670) 0.05 500 0.89 4.60 667.0 0.00 0.91 4.60 107.07 0.14
scale free full 2 0.1 (10, 58) (45, 120) 0.05 500 0.51 2.07 7.4 0.51 NA NA NA NA
scale free full 2 0.5 (10, 58) (45, 120) 0.05 500 0.51 2.07 7.4 0.51 NA NA NA NA
scale free full 3 0.1 (10, 58, 1450) (45, 123, 1450) 0.05 500 0.85 4.78 665.9 3.03 0.88 4.78 107.07 0.22
scale free full 3 0.5 (10, 58, 1450) (45, 122, 1450) 0.05 500 0.85 4.84 671.4 3.42 0.86 4.84 107.07 0.25
Table 1.3: Summary statistics for all random graph networks in the complex networks datset
Subgraph type Connect. type Layers StDev. Nodes per layer Edges per layer Subgraph prob. Sample size Modularity (top) Avg. node degree top Avg edges within communities (top) Avg. edges between communities (top) Modularity (middle) Avg. node degree middle Avg edges within communities (middle) Avg edges between communities (middle)
random graph disc 2 0.1 (10, 45) (0, 32) 0.05 500 0.88 0.71 3.2 0.00 NA NA NA NA
random graph disc 2 0.5 (10, 45) (0, 32) 0.05 500 0.88 0.71 3.2 0.00 NA NA NA NA
random graph disc 3 0.1 (10, 45, 725) (0, 32, 678) 0.05 500 0.89 0.94 67.8 0.00 0.78 0.94 12.16 0.07
random graph disc 3 0.5 (10, 45, 725) (0, 32, 665) 0.05 500 0.88 0.92 66.5 0.00 0.80 0.92 12.22 0.06
random graph full 2 0.1 (10, 45) (45, 77) 0.05 500 0.31 1.71 3.2 0.50 NA NA NA NA
random graph full 2 0.5 (10, 45) (45, 77) 0.05 500 0.31 1.71 3.2 0.50 NA NA NA NA
random graph full 3 0.1 (10, 45, 725) (45, 78, 725) 0.05 500 0.76 1.04 65.5 1.10 0.70 1.04 12.18 0.10
random graph full 3 0.5 (10, 45, 725) (45, 78, 725) 0.05 500 0.72 1.09 65.7 1.48 0.67 1.09 12.16 0.12
Table 1.4: Summary statistics for intermediate difficulty simulated networks.
Subgraph type Connect. type Layers StDev. Nodes per layer Edges per layer Subgraph prob. Sample size Modularity (top) Avg. node degree top Avg edges within communities (top) Avg. edges between communities (top) Modularity (middle) Avg. node degree middle Avg edges within communities (middle) Avg edges between communities (middle)
small world disc 3 0.1 (5, 15, 300) (0, 15, 354) 0.05 500 0.80 1.18 70.8 0.0 0.78 1.18 20.00 0.26
small world full 3 0.1 (5, 15, 300) (10, 25, 300) 0.05 500 0.72 1.34 73.6 1.7 0.68 1.34 20.00 0.49
scale free disc 3 0.1 (5, 15, 300) (0, 10, 966) 0.05 500 0.78 3.22 193.2 0.0 0.87 3.22 61.33 0.22
scale free full 3 0.1 (5, 15, 300) (10, 20, 300) 0.05 500 0.75 3.32 193.2 1.5 0.84 3.32 61.33 0.36
random graph disc 3 0.1 (5, 12, 167) (0, 7, 133) 0.05 500 0.79 0.80 26.6 0.0 0.79 0.80 9.67 0.13
random graph full 3 0.1 (5, 12, 167) (10, 17, 167) 0.05 500 0.66 0.89 26.0 0.9 0.70 0.89 9.67 0.24
Table 1.5: Summary statistics for simple simulated networks. These networks contain fewer than 100 nodes at the observed level and only cover small world subgraph architecture.
Subgraph type Connect. type Layers StDev. Nodes per layer Edges per layer Subgraph prob. Sample size Modularity (top) Avg. node degree top Avg edges within communities (top) Avg. edges between communities (top) Modularity (middle) Avg. node degree middle Avg edges within communities (middle) Avg edges between communities (middle)
small world disc 2 0.1 (2, 6) (0, 6) 0.05 500 0.50 1.00 3 0.0 NA NA NA NA
small world disc 3 0.1 (2, 6, 18) (0, 6, 24) 0.05 500 0.50 1.33 12 0.0 0.58 1.33 3 0.20
small world full 2 0.1 (2, 6) (1, 7) 0.05 500 0.36 1.17 3 0.5 NA NA NA NA
small world full 3 0.1 (2, 6, 18) (1, 7, 18) 0.05 500 0.46 1.39 12 0.5 0.55 1.39 3 0.23
Table 1.6: HCD settings for Examples 1 - 5.
Parameter Example 1 Examples 2, 4, 6 Examples 3 Example 4 Example 5
Hidden Layer Dimensions: [256, 128, 64, 32] [256, 128, 64, 32] [256, 128, 64, 32] [512, 256, 128, 64, 32] [512, 256, 128, 64, 32]
X_loss: 1 0.0001 0.0001 0.0001 0.0001
Modularity_loss: 0.001 0.001 0 0.001 0
Clustering_loss: [0.001, 0.0001] [0.001, 0.0001] [0.001, 0.0001] [0.001, 0.0001] [0.001, 0.0001]
Learning_rate: 1e-05 1e-05 1e-05 1e-05 1e-05
True Comms: True True True True True
Epochs: 500 500 500 500 500
Multi Head Attn: True True True True True
Use Attention: True True True True True
Attn. Heads: 10 10 10 20 20
Normalize_inputs: True True True True True
Input Graph True Graph True Graph True Graph True Graph True Graph
Network 3layer-Disc-SMW 3layer-Disc-SMW 3layer-Disc-SMW 3layer-Full-SMW 3layer-Full-SMW
Dataset Intermediate Networks Intermediate Networks Intermediate Networks Intermediate Networks Intermediate Networks
Table 1.7: Performance metrics for Louvain method and HCD in Examples 1 - 3. For details regarding parameter settings see Examples 1 - 3 under section Examples. All results are for the disconnected small-world intermediate network.
Example 1
Example 2
Example 3
Comparison Homogeneity Completeness NMI Homogeneity Completeness NMI Homogeneity Completeness NMI
louvain vs middle 0.665 0.843 0.743 0.661 0.820 0.732 0.680 0.874 0.765
louvain vs top 1.000 0.754 0.860 1.000 0.737 0.849 1.000 0.764 0.866
HCD: middle vs middle 0.497 0.576 0.533 0.660 0.704 0.682 0.568 0.584 0.576
HCD: middle vs top 0.608 0.418 0.495 0.966 0.612 0.749 0.790 0.482 0.599
HCD: top vs top 0.418 1.000 0.590 0.953 0.954 0.953 0.762 0.823 0.791
Table 1.8: Performance metrics for Louvain method and HCD in Examples 4-5. For details regarding parameter settings see Examples 4-5 under section Examples. All results are for the fully connected small-world intermediate network.
Example 4
Example 5
Comparison Homogeneity Completeness NMI Homogeneity Completeness NMI
louvain vs middle 0.621 0.705 0.660 0.617 0.756 0.679
louvain vs top 0.791 0.534 0.638 0.819 0.597 0.690
HCD: middle vs middle 0.568 0.632 0.598 0.547 0.608 0.576
HCD: middle vs top 0.776 0.513 0.618 0.748 0.493 0.594
HCD: top vs top 0.725 0.732 0.728 0.686 0.698 0.692
Table 1.9: Performance metrics for Louvain method and HCD in Examples 6-7. For details regarding parameter settings see Examples 4-5 under section Examples. All results are for the fully connected small-world intermediate network.
Example 6
Example 7
Comparison Homogeneity Completeness NMI Homogeneity Completeness NMI
louvain vs middle 0.719 0.853 0.780 0.685 0.802 0.739
louvain vs top 1.000 0.705 0.827 1.000 0.696 0.821
HCD: middle vs middle 0.460 0.483 0.471 0.617 0.663 0.639
HCD: middle vs top 0.565 0.352 0.434 0.809 0.516 0.630
HCD: top vs top 0.518 0.840 0.640 0.795 0.824 0.809

2 Figures

The true and reconstructed adjacency matrix for Example 1 application of HCD to the disconnected small-world intermediate network. Left: the reconstructed adjacency matrix after 500 training epochs. Right: The input graph and true adjacency matrix.

Figure 2.1: The true and reconstructed adjacency matrix for Example 1 application of HCD to the disconnected small-world intermediate network. Left: the reconstructed adjacency matrix after 500 training epochs. Right: The input graph and true adjacency matrix.

The middle and top layer community predictions corresponding to Example 1 - application of HCD to disconnected small-world intermeidate network.

Figure 2.2: The middle and top layer community predictions corresponding to Example 1 - application of HCD to disconnected small-world intermeidate network.

Example 1 training loss curves.

Figure 2.3: Example 1 training loss curves.

A plot of the nodes in the first three prinicple dimensions of the attribute space. Colors correspond to the predicted top layer community assignments corresponding the application in Example 1 for the "best" performing epoch.

Figure 2.4: A plot of the nodes in the first three prinicple dimensions of the attribute space. Colors correspond to the predicted top layer community assignments corresponding the application in Example 1 for the “best” performing epoch.

Example 2 training loss curves.

Figure 2.5: Example 2 training loss curves.

The true and reconstructed adjacency matrix for Example 2 application of HCD to the disconnected small-world intermediate network. Left: the reconstructed adjacency matrix after 500 training epochs. Right: The input graph and true adjacency matrix.

Figure 2.6: The true and reconstructed adjacency matrix for Example 2 application of HCD to the disconnected small-world intermediate network. Left: the reconstructed adjacency matrix after 500 training epochs. Right: The input graph and true adjacency matrix.

The middle and top layer community predictions corresponding to Example 2 - application of HCD to disconnected small-world intermeidate network with downweighted attribute reconstruction loss

Figure 2.7: The middle and top layer community predictions corresponding to Example 2 - application of HCD to disconnected small-world intermeidate network with downweighted attribute reconstruction loss

The true and reconstructed adjacency matrix for Example 3 - application of HCD to the disconnected small-world intermediate network. Left: the reconstructed adjacency matrix after 500 training epochs. Right: The input graph and true adjacency matrix.

Figure 2.8: The true and reconstructed adjacency matrix for Example 3 - application of HCD to the disconnected small-world intermediate network. Left: the reconstructed adjacency matrix after 500 training epochs. Right: The input graph and true adjacency matrix.

The middle and top layer community predictions corresponding to Example 3 - application of HCD to disconnected small-world intermeidate network with downweighted attribute reconstruction loss

Figure 2.9: The middle and top layer community predictions corresponding to Example 3 - application of HCD to disconnected small-world intermeidate network with downweighted attribute reconstruction loss

A plot of the nodes in the first three prinicple dimensions of the attribute space. Colors correspond to the predicted top layer community assignments corresponding the application in Example 1 for the "best" performing epoch.

Figure 2.10: A plot of the nodes in the first three prinicple dimensions of the attribute space. Colors correspond to the predicted top layer community assignments corresponding the application in Example 1 for the “best” performing epoch.

Louvain community predictions compared to the true middle and top layer communities for the fully connected small-world intermediate network

Figure 2.11: Louvain community predictions compared to the true middle and top layer communities for the fully connected small-world intermediate network

The middle and top layer community predictions corresponding to Example 4 - application of HCD to the fully-connected, small-world intermeidate network.

Figure 2.12: The middle and top layer community predictions corresponding to Example 4 - application of HCD to the fully-connected, small-world intermeidate network.

A plot of the nodes in the first three prinicple dimensions of the attribute space for the fully connected, small-world intermeidate netowrk. Colors correspond to the predicted top layer community assignments corresponding the application in Example 4 for the "best" performing epoch.

Figure 2.13: A plot of the nodes in the first three prinicple dimensions of the attribute space for the fully connected, small-world intermeidate netowrk. Colors correspond to the predicted top layer community assignments corresponding the application in Example 4 for the “best” performing epoch.

A plot of the nodes in the first three prinicple dimensions of the attribute space for the disconnected, small-world intermeidate netowrk. Colors correspond to the true top layer communities. In this network, orign nodes are simulated from the same distribution resulting in all communities originating from a central point in the PC space. Red points represent parent nodes

Figure 2.14: A plot of the nodes in the first three prinicple dimensions of the attribute space for the disconnected, small-world intermeidate netowrk. Colors correspond to the true top layer communities. In this network, orign nodes are simulated from the same distribution resulting in all communities originating from a central point in the PC space. Red points represent parent nodes

A plot of the nodes in the first three prinicple dimensions of the attribute space for the disconnected, small-world intermeidate network. In this network, orign nodes are simulated from a unique distribution resulting in all communities originating at distinct points in the PC space. Red points represent parent nodes

Figure 2.15: A plot of the nodes in the first three prinicple dimensions of the attribute space for the disconnected, small-world intermeidate network. In this network, orign nodes are simulated from a unique distribution resulting in all communities originating at distinct points in the PC space. Red points represent parent nodes

A plot of the nodes in the first three prinicple dimensions of the attribute space for the fully connected, small-world intermeidate netowrk. Colors correspond to the predicted top layer community assignments corresponding the application in Example 6 for the "best" performing epoch.

Figure 2.16: A plot of the nodes in the first three prinicple dimensions of the attribute space for the fully connected, small-world intermeidate netowrk. Colors correspond to the predicted top layer community assignments corresponding the application in Example 6 for the “best” performing epoch.

Example 6 training loss curves.

Figure 2.17: Example 6 training loss curves.

3 References

Barabási, Albert-László, and Eric Bonabeau. 2003. “Scale-Free Networks.” Scientific American 288 (5): 60–69.
Salehi, Amin, and Hasan Davulcu. 2019. “Graph Attention Auto-Encoders.” arXiv Preprint arXiv:1905.10715.
Veličković, Petar, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. “Graph Attention Networks.” arXiv Preprint arXiv:1710.10903.
Watts, Duncan J, and Steven H Strogatz. 1998. “Collective Dynamics of ‘Small-World’networks.” Nature 393 (6684): 440–42.